ABSTRACT

In this paper, the perceptual quality difference between
scalable and single-layer videos coded at the same spatial,
temporal and amplitude resolution (STAR) is investigated through
a subjective test on a mobile platform. Three source videos are
considered, and for each source video the single-layer and
scalable versions are compared at 9 different STARs. We use
paired comparison methods with and without a "tie" option.
Results collected from 10 subjects in the test without the "tie"
option and 6 subjects in the test with the "tie" option show that
there is no significant quality difference between scalable and
single-layer video when coded at the same STAR. An analysis of
variance (ANOVA) test is also performed to further confirm the
finding.

Index Terms: Perceptual video quality, paired comparison,
scalable video

1. INTRODUCTION 

Scalable video coding with spatial, temporal and amplitude
scalability offers video servers and clients the flexibility to
choose appropriate video layers according to the network
bandwidth and the user preference. Given a bandwidth constraint,
the spatial resolution (controlled by frame size), temporal
resolution (controlled by frame rate) and amplitude resolution
(controlled by quantization parameter) can be adjusted so that
the optimal perceptual quality is achieved. However, scalable
coding has not been widely adopted in commercial applications so
far because of its complexity and its reduced coding efficiency
compared to single-layer coding. Most existing video streaming
architectures use multiple copies of single-layer coded videos at
different STARs, and the system sends the version coded at a
particular STAR based on the network condition. It is therefore
interesting and useful to see whether there are any quality
differences between single-layer and scalable videos coded at the
same STAR. In [1], [2], we investigated the impact of STAR on the
perceptual quality and derived a model relating the perceptual
quality to the STAR. It is also of interest to see whether the
same model is applicable to non-scalable video.

In this work, we report results from subjective tests that
compare the perceived quality of single-layer and scalable video
coded at the same STAR combination. We design our subjective
tests based on the paired comparison methods [3]. We conduct the
tests on a mobile platform with a 4.1-inch WVGA (854x480) touch
screen running the Android OS. The remainder of this paper is
organized as follows: Section 2 introduces the test interface,
the test video pool and the test methodology. Section 3 presents
and analyzes the subjective test results. We conclude this work
in Section 4.

2. TESTING INTERFACE AND METHODOLOGY 

2.1. Testing interface 

The subjective tests are conducted on TI's Zoom2 mobile
development platform, equipped with a 4.1-inch WVGA multi-touch
screen. The interface uses Android's own video playback library
(Android SDK) for playback, with Java and XML controlling the
high-level program flow. For details on the user interface
design, please see [4].

2.2. Test video pool 

Three videos, city, soccer, and foreman, from the standard test
video database¹ are used in the test. All videos are originally
at 4CIF (704x576) spatial resolution with a frame rate of 30Hz,
and each sequence is 8 seconds long (240 frames). A
sine-windowed sinc function, which is the recommended
downsampling filter in the H.264/SVC standard [5], is used to
generate videos at spatial resolutions of CIF and QCIF. The
JSVM 9.18 [6] encoder is used to generate both single-layer and
layered video. The GOP size is 16 frames in all cases.

¹ Available at ftp://ftp.tnt.uni-hannover.de/pub/svc/testsequences/

(a) City @4CIF/QP36/30Hz/53rd Frame (b) Foreman @CIF/QP28/30Hz/164th Frame (c) Soccer @4CIF/QP28/15Hz/75th Frame

Fig. 1. Comparison snapshots for single-layer and scalable videos at same STAR.

We investigate the effect of scalable coding in each dimension
(i.e., spatial, temporal, or amplitude scalability) separately,
while fixing the resolutions of the other two dimensions at the
highest. Specifically, to examine the effect of spatial
scalability, we code all videos at the highest temporal and
amplitude resolution (frame rate FR=30Hz, QP=28). To create
non-scalable videos at different spatial resolutions (SRs), we
code pre-downsampled input videos at QCIF, CIF, and 4CIF
resolutions separately, using the JSVM encoder in single-layer
mode. To create scalable videos, we create a three-layer
bitstream using the JSVM encoder, invoking only spatial
scalability, with QCIF as the base layer. For temporal
scalability, we fix SR at 4CIF and QP at 28, and produce
temporally scalable videos using the JSVM encoder with the
hierarchical B temporal prediction structure, with the base layer
corresponding to 3.75 Hz and additional enhancement layers
leading to 7.5, 15, and 30 Hz, respectively. The non-scalable
versions are created by coding the pre-downsampled input video at
7.5, 15, and 30 Hz in non-scalable mode using the I15BP
structure, with only the first frame coded in I mode. Thus, the
temporally scalable videos at lower frame rates have a higher
I/P-frame ratio than the corresponding single-layer videos. No QP
cascading is used when temporal or spatial scalability is
invoked. Finally, to test amplitude scalability (commonly known
as quality or SNR scalability), Coarse Grain Scalability (CGS) is
used with the base layer at QP 44 and additional layers at QP 36
and 28, respectively. For the single-layer counterparts, we
directly code the video at each QP (specifically, QP 28, 36 and
44 individually). Table 1 summarizes the test points examined in
the different cases.



Table 1. Test points

Common parameters    Test parameters
QP28, 30Hz           4CIF, CIF and QCIF
4CIF, 30Hz           QP28, QP36 and QP44
QP28, 4CIF           30Hz, 15Hz and 7.5Hz
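
As an illustration of Table 1, the following sketch enumerates the 27 test points by varying one dimension at a time while holding the other two at their highest resolution. The names (`HIGHEST`, `LEVELS`, `enumerate_test_points`) are illustrative and are not taken from the test software.

```python
# Sketch: enumerate the test points of Table 1 (all names are illustrative).
VIDEOS = ["city", "soccer", "foreman"]

# Highest resolution in each STAR dimension (the common parameters).
HIGHEST = {"spatial": "4CIF", "temporal": "30Hz", "amplitude": "QP28"}

# Levels examined when a dimension is the one being varied.
LEVELS = {
    "spatial": ["4CIF", "CIF", "QCIF"],
    "temporal": ["30Hz", "15Hz", "7.5Hz"],
    "amplitude": ["QP28", "QP36", "QP44"],
}


def enumerate_test_points():
    """Return (video, varied dimension, STAR) for every test point."""
    points = []
    for video in VIDEOS:
        for dim, levels in LEVELS.items():
            for level in levels:
                star = dict(HIGHEST)  # start from the highest resolutions
                star[dim] = level     # lower only the dimension under test
                points.append((video, dim, star))
    return points


points = enumerate_test_points()
print(len(points))  # 27
```

Each entry corresponds to one single-layer/scalable pair to be compared.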



The coded bitstreams are then extracted and decoded into YUV
format. For the CIF and QCIF streams, a 6-tap half-pel filter
with bilinear quarter-pel interpolation [7] is used to upsample
them to 4CIF for display on the Zoom2 screen. Finally,
single-layer and scalable videos coded at the same STAR are
concatenated in both orders (single-layer shown first, and
scalable shown first) with a 3-second grey (R = G = B = 192)
grey-out interval in between.
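
The half-pel stage of such an upsampler can be sketched in one dimension as follows. The taps (1, -5, 20, 20, -5, 1)/32 are the H.264 luma half-sample filter; that the filter of [7] uses exactly these taps, the border replication, and the function name `upsample2x_1d` are assumptions made for illustration. A 2-D frame would be processed separably along rows and columns, and the bilinear quarter-pel step is omitted here.

```python
# Sketch: 1-D 2x upsampling with a 6-tap half-pel filter.
# Assumptions: H.264 luma half-sample taps, replicated borders, 8-bit samples.
TAPS = (1, -5, 20, 20, -5, 1)  # the taps sum to 32


def upsample2x_1d(samples):
    """Return a list twice as long: originals at even positions,
    interpolated half-pel values at odd positions."""
    n = len(samples)
    out = []
    for i in range(n):
        out.append(samples[i])  # integer-pel position is copied unchanged
        acc = 0
        for k, tap in enumerate(TAPS):
            j = min(max(i - 2 + k, 0), n - 1)  # clamp index at the borders
            acc += tap * samples[j]
        half = (acc + 16) >> 5              # divide by 32 with rounding
        out.append(min(max(half, 0), 255))  # clip to the 8-bit range
    return out


line = upsample2x_1d([0, 0, 255, 255])
print(line)
```

On a flat region the filter returns the input value, and at a step edge the half-pel sample between the two levels falls midway between them.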



2.3. Methodology 

To examine whether there is a perceptual difference between
single-layer and scalable video coded at the same STAR, the
paired comparison method [5] is used. In paired comparison, a
subject views two consecutive videos separated by a grey-out
interval, and is then asked which video is better in terms of
perceived quality. There are two approaches to designing
subjective tests using paired comparison: 2-forced-choice without
a "tie" option and 3-forced-choice with a "tie" option. In this
work, we conduct our subjective tests using both methods. Note
that the 2-forced-choice test without the "tie" option is similar
to the methodology used in the just noticeable difference (JND)
test. When we count the votes, we use the 75% JND criterion.

The subject views a concatenated video in a randomly generated
order (either single-layer first or scalable first). In the test
without the "tie" option, he/she then chooses which video (the
first or the second) has the better quality; in the test with the
"tie" option, he/she may also choose "tie" if the perceived
quality is the same for both. The subject can replay the current
pair as many times as he/she wishes before rating. Each pair of
videos at a particular STAR combination is shown twice, and which
version (single-layer or scalable) is shown first is random and
determined by the interface. The subject must give an opinion
(forced choice) on all test points in the session. The total
number of test points is 27 (3 videos x 3 scalability dimensions
x 3 resolutions). Note that with double rating, each subject
views and rates 54 nineteen-second sequences.
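
The session arithmetic can be checked with a short sketch. It assumes each clip lasts 8 seconds with a 3-second grey-out interval, as described in Section 2.2. That the two presentations of each test point draw their order independently, that the trials are shuffled within the session, the seed, and all names are assumptions, since the text only states that the interface picks the order.

```python
import random

CLIP_SECONDS = 8       # length of each coded sequence
GREY_SECONDS = 3       # grey-out interval between the two clips
NUM_TEST_POINTS = 27   # 3 videos x 3 dimensions x 3 resolutions
REPEATS = 2            # double rating


def build_session(seed=0):
    """Return a shuffled list of (test_point_id, first_shown) trials."""
    rng = random.Random(seed)
    trials = []
    for point in range(NUM_TEST_POINTS):
        for _ in range(REPEATS):
            first = rng.choice(["single-layer", "scalable"])
            trials.append((point, first))
    rng.shuffle(trials)  # assumed: presentation order is randomized
    return trials


trials = build_session()
pair_seconds = CLIP_SECONDS + GREY_SECONDS + CLIP_SECONDS
total_seconds = len(trials) * pair_seconds
print(len(trials), pair_seconds, total_seconds)  # 54 19 1026
```

Without replays, a session therefore contains roughly 17 minutes of video.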

3. RESULT AND ANALYSIS 

Ten subjects with normal vision participated in the
2-forced-choice test, and 6 subjects with normal vision
participated in the 3-forced-choice test. The votes for
single-layer and for scalable videos are counted separately for
each test point.

To provide an intuitive feeling for the processed video sequences
(PVS), Fig. 1 shows a set of snapshots of encoded scalable and
single-layer videos at the same STAR, together with their
corresponding scaled absolute difference images. Each pair of
videos looks perceptually very similar, although there are
non-zero pixel differences.



Table 2. Votes for the 2-forced-choice test without "tie" option

              city             soccer           foreman          All videos
          Single Scalable   Single Scalable   Single Scalable   Single Scalable
4CIF          13        7        8       12        8       12       29       31
CIF            8       12       14        6       12        8       34       26
QCIF           9       11       10       10       11        9       30       30
AllS          30       30       32       28       31       29       93       87
30Hz           9       11       12        8       10       10       31       29
15Hz          13        7        9       11       11        9       33       27
7.5Hz          6       14        7       13       12        8       25       35
AllT          28       32       28       32       33       27       89       91
QP28          10       10       11        9       10       10       31       29
QP36           8       12       13        7       12        8       33       27
QP44           5       15       11        9       14        6       30       30
AllQ          23       37       35       25       36       24       94       86



Table 4. Votes for the 3-forced-choice test with "tie" option

               city                soccer               foreman            All videos
        Single Scalable  Tie Single Scalable  Tie Single Scalable  Tie Single Scalable  Tie
4CIF         1        1   10      0        2   10      1        3    8      2        6   28
CIF          1        2    9      2        1    9      0        2   10      3        5   28
QCIF         2        2    8      1        0   11      0        0   12      3        2   31
AllS         4        5   27      3        3   30      1        5   30      8       13   87
30Hz         2        2    8      1        3    8      2        1    9      5        6   25
15Hz         3        1    8      2        3    7      3        2    7      8        6   22
7.5Hz        2        2    8      1        1   10      2        3    7      5        6   25
AllT         7        5   24      4        7   25      7        6   23     18       18   72
QP28         1        2    9      2        3    7      2        2    8      5        7   24
QP36         2        2    8      1        3    8      0        1   11      3        6   27
QP44         1        1   10      2        2    8      1        0   11      4        3   29
AllQ         4        5   27      5        8   23      3        3   30     12       16   80



Table 3. p-value and F-value of the ANOVA test for the without "tie" test

             p-value   F-value
Spatial      0.5458    0.38
Temporal     0.5549    0.36
Amplitude    0.4946    0.49



Table 2 provides the vote counts for the 2-forced-choice test. As
mentioned in Section 2.3, the 2-forced-choice test can be seen as
a special case of the JND test. If the hypothesis that there is a
"just noticeable" difference in perceived quality is to be
accepted, the winning frequency of the better-quality video
should be at least 75% under the 75% JND criterion, that is, at
least 15 votes for a particular video at a particular STAR
combination, since each video pair is viewed 20 times. In
Table 2, the only such occurrence is city at QP44/30Hz/4CIF.
Thus, apart from this single case, there is no significant
difference in perceptual quality between the scalable and
single-layer video at the STARs examined.
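
The vote threshold and the check against Table 2 can be reproduced with a short sketch. The per-video vote counts are copied from Table 2 as (single-layer, scalable) pairs; the variable and function names are illustrative.

```python
import math

VIEWS_PER_PAIR = 20   # 10 subjects x 2 ratings
JND_FRACTION = 0.75   # 75% JND criterion
threshold = math.ceil(JND_FRACTION * VIEWS_PER_PAIR)  # 15 votes

# (single-layer, scalable) votes from Table 2, per test point and video.
VOTES = {
    "4CIF":  {"city": (13, 7),  "soccer": (8, 12),  "foreman": (8, 12)},
    "CIF":   {"city": (8, 12),  "soccer": (14, 6),  "foreman": (12, 8)},
    "QCIF":  {"city": (9, 11),  "soccer": (10, 10), "foreman": (11, 9)},
    "30Hz":  {"city": (9, 11),  "soccer": (12, 8),  "foreman": (10, 10)},
    "15Hz":  {"city": (13, 7),  "soccer": (9, 11),  "foreman": (11, 9)},
    "7.5Hz": {"city": (6, 14),  "soccer": (7, 13),  "foreman": (12, 8)},
    "QP28":  {"city": (10, 10), "soccer": (11, 9),  "foreman": (10, 10)},
    "QP36":  {"city": (8, 12),  "soccer": (13, 7),  "foreman": (12, 8)},
    "QP44":  {"city": (5, 15),  "soccer": (11, 9),  "foreman": (14, 6)},
}


def jnd_cases(votes, min_votes):
    """List (test point, video, winner, votes) reaching the threshold."""
    found = []
    for point, per_video in votes.items():
        for video, (single, scalable) in per_video.items():
            if single >= min_votes:
                found.append((point, video, "single-layer", single))
            if scalable >= min_votes:
                found.append((point, video, "scalable", scalable))
    return found


cases = jnd_cases(VOTES, threshold)
print(threshold, cases)  # 15 [('QP44', 'city', 'scalable', 15)]
```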

To further examine the statistical significance of the rating
differences, we conducted an ANOVA analysis in the three
dimensions separately; the results are shown in Table 3. In all
cases the p-values are larger than 0.05, indicating that there
are no significant differences between videos coded in
single-layer and scalable modes. We also show the box plots of
the ANOVA tests in Fig. 2. In the box plots, the central red mark
is the median of the data, the notches in the box represent the
95% confidence interval of the median, the edges of the box are
the 25th and 75th percentiles, and the whiskers extend to the
most extreme data points. The 95% confidence intervals of the
medians overlap, indicating that there is no perceived quality
difference between single-layer and scalable coded videos.
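
A one-way ANOVA F statistic can be computed by hand as sketched below. The text does not state which samples entered the ANOVA; this sketch assumes the two groups are the nine per-video vote counts in the spatial rows of Table 2 (single-layer versus scalable). Under that assumption the F statistic for the spatial dimension comes out at about 0.38. The p-value would then follow from the F distribution with (1, 16) degrees of freedom, for example via `scipy.stats.f.sf` if SciPy is available.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of sample groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = 0.0
    ss_within = 0.0
    for g in groups:
        mean = sum(g) / len(g)
        ss_between += len(g) * (mean - grand_mean) ** 2
        ss_within += sum((v - mean) ** 2 for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Assumed samples: spatial rows of Table 2 (4CIF, CIF, QCIF for 3 videos).
single = [13, 8, 8, 8, 14, 12, 9, 10, 11]
scalable = [7, 12, 12, 12, 6, 8, 11, 10, 9]

f_stat, df_b, df_w = one_way_anova_f([single, scalable])
print(round(f_stat, 2), df_b, df_w)  # 0.38 1 16
```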

Table 4 shows the vote counts for the 3-forced-choice test. In
most cases, the majority of votes are given to the "tie" option,
indicating that the viewers could not tell the difference between
the single-layer and scalable coded video at the same STAR.
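
The share of "tie" votes can be tallied directly from the "All videos" columns of Table 4. The sketch below does this per test point; the names are illustrative.

```python
# (single-layer, scalable, tie) votes from the "All videos" columns of Table 4.
ALL_VIDEOS = {
    "4CIF": (2, 6, 28), "CIF": (3, 5, 28), "QCIF": (3, 2, 31),
    "30Hz": (5, 6, 25), "15Hz": (8, 6, 22), "7.5Hz": (5, 6, 25),
    "QP28": (5, 7, 24), "QP36": (3, 6, 27), "QP44": (4, 3, 29),
}


def tie_share(counts):
    """Fraction of votes given to the "tie" option."""
    single, scalable, tie = counts
    return tie / (single + scalable + tie)


shares = {point: tie_share(c) for point, c in ALL_VIDEOS.items()}
lowest = min(shares, key=shares.get)
print(lowest, round(shares[lowest], 2))  # 15Hz 0.61
```

Each test point collects 36 votes (6 subjects x 2 ratings x 3 videos), so even the lowest share corresponds to a clear majority of "tie" votes.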

4. CONCLUSION 

This paper reports results from a perceptual quality assessment
comparing single-layer video and scalable video coded at the same
spatial, temporal and amplitude resolutions (STAR). The
subjective test was conducted using paired comparison with and
without a "tie" option and with double rating. Data were
collected from ten subjects for the test without the "tie" option
and from 6 subjects for the test with the "tie" option. The
results show that under the same STAR there is no significant
perceptual quality difference between single-layer coded video
and scalable video, both by inspecting the votes and by the ANOVA
test. Although the single-layer and scalable videos are generated
using H.264/AVC and H.264/SVC compliant codecs, respectively
(both implemented via the JSVM encoder under different settings),
we believe the conclusion may hold generally for any videos coded
at the same STAR, regardless of the encoding method. Note that
here we measure the amplitude resolution by the inverse of the
quantization stepsize. We consider two videos as having the same
amplitude resolution if they are quantized using the same type of
quantizer and the same quantization stepsize. One important
consequence of our finding is that the Q-STAR model developed in
our prior work [2], which models the perceptual quality as a
function of STAR, is applicable to both scalable and non-scalable
video.

ANOVA for spatial resolution    ANOVA for temporal resolution    ANOVA for amplitude resolution

Fig. 2. Box plots for the ANOVA tests. On the x-axis, 1 indicates
single-layer video and 2 indicates scalable video.

(a) City scaled absolute difference map, MAD=1.9299 (b) Foreman scaled absolute difference map, MAD=0.9875 (c) Soccer scaled absolute difference map, MAD=1.2610

Fig. 3. Scaled absolute difference maps for three videos
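
For reference, a scaled absolute difference map and its MAD can be computed as sketched below. The sketch assumes MAD denotes the mean absolute difference between co-located luma samples of the two decoded frames; the scale factor of 10, the sample frames, and all names are assumptions, since the scaling is not stated.

```python
def abs_diff_map(frame_a, frame_b, scale=10):
    """Return (scaled map, MAD) for two equally sized 8-bit luma frames.

    Frames are lists of rows; `scale` is an assumed display gain.
    """
    scaled = []
    total = 0
    count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        out_row = []
        for a, b in zip(row_a, row_b):
            d = abs(a - b)
            total += d
            count += 1
            out_row.append(min(255, scale * d))  # clip to 8 bits for display
        scaled.append(out_row)
    return scaled, total / count


# Tiny made-up frames, only to exercise the function.
single = [[100, 102], [98, 100]]
scalable = [[101, 102], [98, 130]]
diff, mad = abs_diff_map(single, scalable)
print(diff, mad)  # [[10, 0], [0, 255]] 7.75
```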



